Water quality is one of the most highly debated issues worldwide at the 
moment. Inadequate water supplies affect human health, hinder food 
production, and degrade the environment. Using contemporary technology to 
analyze pollution statistics can help solve pollution issues. One option is to 
take advantage of advancements in intelligent data processing to conduct 
hydrological parameter analysis. To perform conclusive water quality 
studies, a lot of data is necessary. Unfilled data (information gaps) in the 
long-term hydrological data set may be due to equipment faults, collection 
schedule delays, or the data collection officer’s absence. The lack of 
hydrological data skews its interpretation. Therefore, interpolation is used to 
recreate and fill missing hydrological data. From 2012 to 2017, the Klang 
River’s biochemical oxygen demand (BOD) in Selangor, Malaysia, was 
sampled. This study used MATLAB to compare the effectiveness of three 
interpolation methods: the piecewise cubic Hermite interpolating 
polynomial (PCHIP), cubic spline data interpolation (Spline), and modified 
Akima piecewise cubic Hermite interpolation (Makima). The accuracy is 
assessed using root mean square error (RMSE). All interpolation 
algorithms offer excellent results with low RMSE. However, PCHIP delivers 
the best match between interpolated and original data. 
1. INTRODUCTION 


Over the years, numerous international gatherings have focused exclusively on water quality. It has 
become a recurring topic of discussion throughout the world, as degraded water quality has a detrimental effect 
on social and economic development. Malaysia is not immune to the problem of water pollution. The Klang 
River, for example, flows for approximately 120 kilometers through the state of Selangor and 
passes directly through the heart of Malaysia’s capital city, Kuala Lumpur [1]. To compound matters, the 
Klang River receives tributaries from 11 significant rivers. The physico-chemical parameters 
contributing to water pollution are the starting point for calculating the water quality index [2], [3]. Biochemical 
oxygen demand (BOD), chemical oxygen demand (COD), dissolved oxygen (DO), total suspended solids (TSS), 
and turbidity are just a few examples of these parameters [4]. In Malaysia, the Department of Environment 
(DOE), the country’s environment ministry, provides water quality metrics. To analyze the BOD pattern, 
hydrological data for the Klang River from 2012 to 2017 were obtained from DOE. Unfortunately, the data were 
not collected at consistent intervals. Many readings were missing for various reasons, including sensor 
problems, disruptions in the data collection schedule, and the officers’ unavailability at the time. There were 
only 164 data readings for each type of water parameter during this period, including BOD. 

In contrast, based on the number of days in six years, more than 1,000 daily readings should have been 
collected from the Klang River during this period. For further technical applications, such as forecasting and 
prediction based on machine learning techniques like artificial neural networks (ANN), the data 
should be arranged at equal intervals (daily, monthly, or annual) to enable the study of pollutant 
data features. The data must be validated, corrected, and improved before being used in 
more complex systems. There are two approaches to resolving this issue: manual drawing or 
computational mathematical algorithms [5]. Typically, a mathematical technique called ‘interpolation’ is 
advised [6], whereas ‘extrapolation’ is commonly used to predict values outside the known 
data range [7]. Spatial interpolation is a frequently used technique for estimating values at unknown points in a 
continuous domain from the available data [8], [9]. It employs a range of different 
methodologies and solutions, depending on the available information [10]. A recent study reports that about 38 
different spatial interpolation algorithms are in common use [11]. A previous study applied interpolation to 
a real-world data problem using the piecewise cubic Hermite interpolating polynomial (PCHIP) [12]. The results 
were compared with those obtained using cubic splines and piecewise linear functions to determine the 
best interpolation model for approximating generic small-arms projectile aerodynamics. The test also 
demonstrated the performance and variants of polynomial functions in terms of interval and coefficients. 

Interpolation has also been used in signal processing [13], where it helps distinguish between 
background noise and non-stationary signals. Barker and McDougall [14] successfully tested two 
interpolation methods using hydrographic data. Oceanic hydrographic data typically include the temperature 
and salinity of the ocean water, measured at varying pressures. Interpolation is critical 
in this case, as researchers usually adjust the intervals to standard pressures. Apart from its use in 
hydrography, interpolation is also very useful for determining projectile aerodynamics. This 
was shown by Rabbath and Corriveau [12], who determined ballistic projectile aerodynamics based on a 
discrete set of Mach numbers. They found that the interpolation technique gave 
significant results when compared with the predictions for the original ballistic projectile. Interpolation is also 
often used in real-world data problems: Abdulrahman et al. [15] assessed 
surface area with height using global navigation satellite systems (GNSS) and geographical 
information systems (GIS) techniques. To identify the best technique, they tested the data 
using different interpolation techniques, namely inverse distance weighting (IDW), ordinary kriging, 
and local polynomial interpolation (LPI). Overall, interpolation 
gave a small root mean square (RMS) error and produced values close to the actual values. In 
meteorology, Wu et al. [16] used interpolation to study the relationship between weather patterns and 
climatological dynamics. Interpolation was used to overcome the problem of uneven sampling 
points, in addition to the random distribution that is common in practical applications. They found that 
interpolation provided the best fit between the measured values and the 
expected values. Another successful example is the study by 
Wu et al. [17], which compared previous bathymetric hydrographic data 
with interpolated estimates for the Mississippi River. They found that more sample points 
yield better bathymetry estimates [9], [18]. 

This work compares spatial interpolation methods for assessing water quality based on 
BOD, one of the physico-chemical parameters, for the Klang River. MATLAB software 
was used to carry out the interpolation. MATLAB includes three interpolation 
algorithms that can be employed, namely PCHIP, cubic spline interpolation (Spline), and modified Akima 
piecewise cubic Hermite interpolation (Makima). The gaps in the Klang River’s hydrological data are recreated 
and filled using the software with the appropriate technique. The three 
interpolation techniques were compared to determine which method is the most accurate 
in filling the gaps in the Klang River data received from the DOE. The findings are 
based on the hydrological data obtained from 2012 to 2017. 


2. RESEARCH METHOD 

This section describes the data used as samples and the testing procedures applied. The mathematical 
equations employed in this investigation are also presented. 
2.1. Data acquisition 

The Klang River is situated in the state of Selangor in Malaysia, and it is the fourth largest river basin in 
the state, with an area of around 1,290 square km [19]. Approximately 40 km of the river flows through the 
federal territory of Kuala Lumpur, Malaysia’s capital [20]. Figure 1(a) displays the map of the Klang River as it 
flows through Selangor and Kuala Lumpur. 

Figure 1(b) depicts the locations of the DOE Malaysia-controlled water sampling 
stations along the Klang River. The sampling station chosen for this work is 1K06 (in Petaling Bahagia, 
Kuala Lumpur), situated between the river’s upstream and downstream reaches, as illustrated in the latter 
figure. BOD, COD, DO, ammoniacal nitrogen, suspended solids, dissolved solids, temperature, pH, and 
turbidity have all been sampled at this station. Only BOD is tested in this study using the 
interpolation methods. It was chosen because BOD indicates the level of pollution caused by organic 
matter in river water [21]. MATLAB 2020 is used to perform the interpolation for all tests. 
Figure 1. Klang River flow network (a) through the state of Selangor and Kuala Lumpur, Malaysia [22] and 
(b) the water sampling station along the river basin [2] 


2.2. Interpolation technique comparison 

Spatial interpolation is frequently used in time series analysis when observation data are incomplete, 
contain gaps, or when continuous measurements are unavailable at a specific time, date, or 
location [23]. Interpolation is frequently used to convert discrete numeric data to a continuous function. Care 
should be taken when relating the existing data to the data to be determined, as this can result in significant 
errors [6]. A theoretical review is required to determine the most appropriate technique, either interpolation 
or extrapolation. Numerous scientific studies have used interpolation to solve spatial and temporal problems, 
but few have done so for water quality [23]. 

The Spline method is a frequently used method for interpolation. A Spline is a piecewise 
polynomial function. Each data set subjected to interpolation must keep the monotone nature of the original 
data set [24]. This is done to retain the concavity and convexity of the original signal. Each piece of the function 
is defined between consecutive data point knots, $t_i < t_{i+1}$, and specific conditions must be satisfied at each 
knot before proceeding to the next piece. For a Spline of degree $a$, derivatives up to order 
$a - 1$ are required to match at each knot, which fixes the slope at each link. For cubic Splines, which use 
polynomials of degree 3, these derivative conditions apply at every data point knot. 


2.2.1. Cubic spline interpolation 

Cubic Spline interpolation uses a piecewise cubic polynomial, that is, a Spline of degree 3. Each Spline piece has 
its own weights attached to the data connection points [25]. These are referred to as Spline derivatives [26]. The 
weights, or coefficients, of the cubic polynomial are used to interpolate the data. Adjusting the coefficients 
ensures that the curve passes through each data point without exhibiting any unpredictable behaviour 
[27]. The mathematical expression for cubic Spline interpolation is: 


$$f_i(x) = \beta_3 (x - x_i)^3 + \beta_2 (x - x_i)^2 + \beta_1 (x - x_i) + \beta_0 \qquad (1)$$


Bulletin of Electr Eng & Inf, Vol. 11, No. 4, August 2022: 2368-2377 


Bulletin of Electr Eng & Inf ISSN: 2302-9285 


where the cubic Spline satisfies $f(x_i) = y_i$ and interpolates on the partition $x_0 < x_1 < \dots < x_{n-1}$. 
The Spline consists of $n-1$ cubic polynomials $f_i$, each defined on $[x_i, x_{i+1}]$. Adjacent pieces 
join at the interior knots $x_i$ ($i = 1, \dots, n-2$). 
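
As a minimal sketch of how a piecewise cubic of the form (1) is evaluated once its coefficients are known, the snippet below locates the interval containing a query point and evaluates the local cubic in Horner form. The function name, the coefficient layout, and the two-piece example are illustrative assumptions; it does not compute the Spline coefficients themselves.

```python
from bisect import bisect_right


def eval_piecewise_cubic(knots, coeffs, x):
    """Evaluate f_i(x) = b3*(x-x_i)**3 + b2*(x-x_i)**2 + b1*(x-x_i) + b0.

    knots  : sorted list x_0 < x_1 < ... < x_{n-1}
    coeffs : list of n-1 tuples (b3, b2, b1, b0), one per interval
    """
    if not knots[0] <= x <= knots[-1]:
        raise ValueError("x lies outside the interpolation range")
    # Index i of the interval [x_i, x_{i+1}] containing x
    # (the last interval is used when x equals the final knot).
    i = min(bisect_right(knots, x) - 1, len(coeffs) - 1)
    dx = x - knots[i]
    b3, b2, b1, b0 = coeffs[i]
    # Horner form of the cubic in (x - x_i).
    return ((b3 * dx + b2) * dx + b1) * dx + b0


# Hypothetical two-piece example: both pieces reproduce y = x**3 on [0, 2].
knots = [0.0, 1.0, 2.0]
coeffs = [
    (1.0, 0.0, 0.0, 0.0),  # x**3 expanded about x_0 = 0
    (1.0, 3.0, 3.0, 1.0),  # x**3 expanded about x_1 = 1
]
value = eval_piecewise_cubic(knots, coeffs, 1.5)
```

Because both pieces describe the same cubic, the value and slope agree at the shared knot, which is the joining condition described above.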


2.2.2. Piecewise cubic hermite interpolating polynomial 

PCHIP interpolates tightly around the data points and usually does not overshoot 
unrealistically [28]. The interpolant stays bounded by the data points. The cubic polynomial of PCHIP is determined 
from the values at the data point knots and the derivative values set at those knots. 
Piecewise cubic Hermite interpolation has proven to be robust, accurate, and adaptable [29]. The 
mathematical expression of the cubic Hermite interpolation is given in (2). 


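$$P(x) = h_1 + (x - x_i) h_2 + (x - x_i)^2 h_3 + (x - x_i)^3 h_4 \qquad (2)$$

where $x_i < x < x_{i+1}$ and

$$h_1 = f_i, \quad h_2 = f'_i, \quad h_3 = \frac{3C_{i+1} - f'_{i+1} - 2f'_i}{x_{i+1} - x_i}, \quad h_4 = -\frac{2C_{i+1} - f'_{i+1} - f'_i}{(x_{i+1} - x_i)^2} \qquad (3)$$

Here $f_i$ is the data value and $f'_i$ is the slope at $x_i$ for $1 \le i \le n$, and 
$C_{i+1} = (f_{i+1} - f_i)/(x_{i+1} - x_i)$ is the secant slope between consecutive data points. 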
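
The cubic Hermite form in (2) is fully determined by the two end values and the two end slopes of an interval. The sketch below computes the four coefficients from those quantities using the standard Hermite conditions and then checks that the cubic reproduces the prescribed values and slopes at both knots. The function names and the numeric knot data are hypothetical and are not taken from the Klang River record.

```python
def hermite_coefficients(x0, x1, f0, f1, d0, d1):
    """Return (h1, h2, h3, h4) of P(x) = h1 + h2*s + h3*s**2 + h4*s**3, s = x - x0."""
    dx = x1 - x0
    secant = (f1 - f0) / dx  # slope of the chord between the two knots
    h1 = f0
    h2 = d0
    h3 = (3.0 * secant - 2.0 * d0 - d1) / dx
    h4 = (d0 + d1 - 2.0 * secant) / dx ** 2
    return h1, h2, h3, h4


def hermite_eval(coeffs, x0, x):
    """Value of the cubic at x, in Horner form."""
    h1, h2, h3, h4 = coeffs
    s = x - x0
    return h1 + s * (h2 + s * (h3 + s * h4))


def hermite_slope(coeffs, x0, x):
    """First derivative of the cubic at x."""
    _, h2, h3, h4 = coeffs
    s = x - x0
    return h2 + s * (2.0 * h3 + 3.0 * s * h4)


# Hypothetical interval: values 4 and 7, slopes 0.5 and 1.5 at x = 2 and x = 5.
c = hermite_coefficients(x0=2.0, x1=5.0, f0=4.0, f1=7.0, d0=0.5, d1=1.5)
end_value = hermite_eval(c, 2.0, 5.0)
end_slope = hermite_slope(c, 2.0, 5.0)
```

PCHIP, Spline, and Makima can all be written in this form; they differ in how the knot slopes are chosen.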


2.2.3. Makima interpolation 
Makima is a technique that evolved from Akima [30]. Originally, Akima was based on a cubic 
polynomial. Makima interpolation employs the same method as Akima but adds terms to the weights to ensure 
that the slope near flat regions of the data has a value close to zero [27]. This eliminates overshoot and 
undershoot issues and strengthens signal prediction. The modified Akima piecewise cubic Hermite interpolation 
is expressed mathematically in (4): 

$$d_i = \frac{s_1}{s_1 + s_2}\,\delta_{i-1} + \frac{s_2}{s_1 + s_2}\,\delta_i \qquad (4)$$

where

$$s_1 = |\delta_{i+1} - \delta_i| + \frac{|\delta_{i+1} + \delta_i|}{2}, \qquad s_2 = |\delta_{i-1} - \delta_{i-2}| + \frac{|\delta_{i-1} + \delta_{i-2}|}{2} \qquad (5)$$

In (4) and (5), $d_i$ is the derivative value, $\delta_{i-2}$, $\delta_{i-1}$, $\delta_i$, and $\delta_{i+1}$ are the 
slope values, and $s_1$ and $s_2$ are the slope weights. When $\delta_i = \delta_{i+1} = 0$, $s_1$ is forced to 0 
and therefore $d_i = 0$. Compared to the original Akima equation, this prevents overshoot when consecutive 
data nodes contain unchanged data. 
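
To make the weighting in (4) and (5) concrete, the following sketch computes the Makima derivative at one interior knot from the four neighbouring segment slopes. The function name and the handling of the degenerate case where both weights vanish are assumptions made for illustration; the slope values are hypothetical.

```python
def makima_derivative(d_im2, d_im1, d_i, d_ip1):
    """Derivative at a knot from the slopes delta_{i-2}, delta_{i-1}, delta_i, delta_{i+1}."""
    s1 = abs(d_ip1 - d_i) + abs(d_ip1 + d_i) / 2.0
    s2 = abs(d_im1 - d_im2) + abs(d_im1 + d_im2) / 2.0
    if s1 + s2 == 0.0:
        # Assumption: both weights vanish only when all four slopes are zero,
        # so the derivative is set to zero to avoid dividing by zero.
        return 0.0
    return (s1 * d_im1 + s2 * d_i) / (s1 + s2)


# Rising data followed by a flat stretch: the derivative at the knot is zero,
# so the curve does not overshoot into the flat region.
flat_right = makima_derivative(1.0, 2.0, 0.0, 0.0)

# Constant slope on both sides: the derivative equals that slope.
uniform = makima_derivative(1.0, 1.0, 1.0, 1.0)
```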

The most critical requirement after interpolation is that the shape of the new data 
does not differ significantly from that of the original data. Correlation tests cannot be applied to data newly created 
between two existing points, because a correlation test compares the 
original and interpolated data sets on the basis of the same total amount of data. In this work, the number of 
original data points is smaller than the number of interpolated data points. As a result, the correlation test is ruled 
out as a measure of accuracy. After interpolation, every curve segment connecting the original data points has its 
own values. The peaks and valleys of each graph are therefore identified and compared with the positions of the 
original values. This is accomplished by performing the root mean square error (RMSE) test on the peaks 
and on the valleys of the graph. Both the peaks and the valleys of the original graph correspond to 
original BOD data points. 
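
The paper does not write out the RMSE formula, so the sketch below uses the standard definition: the square root of the mean squared difference between paired values. The function name and the sample numbers are illustrative assumptions.

```python
import math


def rmse(original, interpolated):
    """Root mean square error between two equally long sequences of values."""
    if len(original) != len(interpolated) or len(original) == 0:
        raise ValueError("sequences must be non-empty and of equal length")
    squared = [(o - p) ** 2 for o, p in zip(original, interpolated)]
    return math.sqrt(sum(squared) / len(squared))


# Identical values give an error of zero; any deviation gives a positive error.
perfect = rmse([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
offset = rmse([0.0, 0.0], [3.0, 4.0])  # sqrt((9 + 16) / 2)
```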


3. RESULTS AND DISCUSSION 

Interpolation techniques compensate for the missing data, resulting in a value for each day over 
six years. Figure 2 depicts the graph’s original shape before the interpolation process. The initial data from 
DOE Malaysia contained only 164 BOD data points and were not collected daily. As a result, the 
acquisition times are spaced relatively far apart over the six years. 

To obtain accurate interpolation data, each point on the graph is positioned at the day on which the data 
were originally collected. Figure 3 depicts the BOD data in its original location from the 222nd day in 
2012 to the 1902nd day in 2017. As illustrated in Figure 3, the graph retains its original shape, but the actual 
data points have been relocated to the correct day over the six years. Additionally, note how the 
horizontal axis extends from 164 points in Figure 2 to 1902 days in Figure 3. 
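
The repositioning step can be sketched as placing each reading on a daily grid that runs from the first to the last sampled day, leaving the unsampled days empty for the interpolation to fill. The function names and the four example readings are hypothetical; only the day range 222 to 1902 comes from the text.

```python
def daily_grid(first_day, last_day):
    """Consecutive day indices from the first to the last sampled day, inclusive."""
    return list(range(first_day, last_day + 1))


def place_on_grid(sample_days, sample_values, grid):
    """Put each reading on its sampling day; days without a reading become None."""
    lookup = dict(zip(sample_days, sample_values))
    return [lookup.get(day) for day in grid]


days = daily_grid(222, 1902)
n_days = len(days)  # number of daily values to be produced on this span

# Hypothetical readings on four of those days (not actual DOE values).
series = place_on_grid([222, 260, 301, 1902], [5.0, 7.0, 4.0, 6.0], days)
n_gaps = sum(value is None for value in series)
```

The grid from day 222 to day 1902 contains 1902 - 222 + 1 = 1681 days, which matches the number of interpolated data points reported in the results.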
Figure 2. Original shape of BOD data from Klang River with only 164 data points in six consecutive years 

Figure 3. The original position of BOD data in 1902 days 


3.1. Result analysis 


Following interpolation with the three algorithms (PCHIP, Spline, and Makima), 1681 data points are created 
throughout the same period. Figures 4(a)-(c) depict the BOD data after interpolation using the PCHIP, 
Spline, and Makima approaches, respectively. The resulting graphs for the interpolated data are almost 
identical to the original data graph presented in Figure 3. The graph in Figure 5 depicts the peak and valley 
locations for the original BOD data. These points are marked in order to compare the change in position 
between the original BOD data and the BOD data after interpolation. 
Figure 4. The graph result of interpolated BOD data using (a) PCHIP, (b) Spline, and (c) Makima 

Figure 6 overlays the original BOD data and the interpolated data obtained with the 
PCHIP, Spline, and Makima algorithms. This graph shows the differences between the three 
procedures as well as the shape of the original graph. The curvature of the original graph 
is generally preserved, but there are still small differences between the actual pattern and the pattern after 
interpolation. These are owing to the specific traits of each interpolation technique. 

Figure 5. The peak and valleys point from the original data points 

Figure 6. The original data's peaks and valleys were compared to those of the three interpolated data sets 


To assess the difference between the actual and interpolated graphs, tests on the peaks and valleys of the 
original graph are performed. This is done to ensure that the interpolated curves all adhere to the initial curvature. 
Figure 6 illustrates the peaks and valleys of each graph. These markers make it easy to tell the difference 
between the actual and interpolated graph curves. To quantify these discrepancies, the RMSE method is applied 
to all three interpolation approaches. The data set as a whole is divided into two groups, peaks and 
valleys. This deliberate separation isolates any major alterations, which simplifies the examination of 
the discrepancies. The RMSE is computed for each of the two groups, and the findings are summarized in 
Table 1. 


Table 1. Peak and valley data comparison between the original and interpolated data using RMSE 

Interpolation    RMSE against original BOD 
method           Peak      Valley 
PCHIP            0         0 
Spline           0.58      0.75 
Makima           0.13      0.094 
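
The peak and valley comparison behind Table 1 can be sketched as follows. The paper does not describe how peaks and valleys were extracted or paired, so this example assumes strict local extrema and assumes that both curves yield the same number of peaks and valleys in the same order. All names and numbers are hypothetical and do not reproduce the values in Table 1.

```python
import math


def local_extrema(values):
    """Return (peak_indices, valley_indices) of strict interior local extrema."""
    peaks, valleys = [], []
    for i in range(1, len(values) - 1):
        if values[i] > values[i - 1] and values[i] > values[i + 1]:
            peaks.append(i)
        elif values[i] < values[i - 1] and values[i] < values[i + 1]:
            valleys.append(i)
    return peaks, valleys


def rmse(a, b):
    """Root mean square error between two equally long sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


# Hypothetical original readings and a curve that overshoots at the extrema.
original = [2.0, 5.0, 3.0, 6.0, 1.0, 4.0]
overshooting = [2.0, 5.4, 2.8, 6.2, 0.5, 4.0]

orig_peaks, orig_valleys = local_extrema(original)
new_peaks, new_valleys = local_extrema(overshooting)

# Compare the peak group and the valley group separately.
peak_rmse = rmse([original[i] for i in orig_peaks],
                 [overshooting[i] for i in new_peaks])
valley_rmse = rmse([original[i] for i in orig_valleys],
                   [overshooting[i] for i in new_valleys])
```

Splitting the comparison into the two groups shows whether a method errs more on the high side or on the low side of the curve.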


As seen in Table 1, there are discrepancies between the actual BOD data and the interpolated BOD 
data. When the PCHIP technique interpolates the BOD data, the RMSE at the peaks 
and valleys is zero, making it the most precise of the three methods at these curve points. The 
Spline interpolation has an RMSE of 0.58 at the graph’s peaks and 0.75 at its 
valleys. The Makima interpolation also shows errors at the peaks and valleys, 
but its RMSE is smaller than that of Spline: approximately 0.13 at the peaks 
and roughly 0.094 at the valleys. The Spline error is smaller at the peaks than at the valleys, 
whereas the Makima error is smaller at the valleys than at the peaks. 

To determine the causes of the RMSE values for Spline and Makima interpolation and 
to examine the graph form produced by PCHIP, a sample covering a shorter period was taken from the overall 
interpolated data to visualize the curves formed by the three types of interpolation. The curves between the original 
BOD data points and the interpolated data points using PCHIP, Spline, and Makima are shown in Figure 7. 
This figure depicts the BOD data graph for the 475th to 505th data points. This range is used 
as a case study to illustrate the difference between the original BOD data graph and the curves after 
interpolation. 
Figure 7. The difference between the original BOD data curve and interpolated data curve 


Each of the three types of interpolation takes a distinct approach to 
constructing the curve through the original data points. This is the primary reason for the techniques’ different 
RMSE results. Graphs were created for each type of interpolation to help visualize the differences between 
it and the actual BOD data. Figures 8(a)-(c) depict the original data together with the interpolated curve 
obtained using PCHIP in Figure 8(a), Spline interpolation in Figure 8(b), and Makima interpolation 
in Figure 8(c). 

Figure 8(a) illustrates that the PCHIP technique preserves the original data form as closely as possible. 
PCHIP interpolation follows the actual data line smoothly due to its monotone character. The 
PCHIP approach minimizes oscillation effects while generating new data points along the original path. 
By strictly limiting oscillation, it avoids overshoot when the data curve changes abruptly. 
This allows PCHIP to reproduce data as close to the original BOD data as possible. 
Figure 8(b) illustrates that the Spline interpolation technique exhibits strikingly different features from the 
PCHIP technique. The Spline technique tends to produce a relatively strong 
swing following a change in the curve’s direction. This is common when a curve is fitted using the Spline fitting 
method [31]. Overshoot also occurs along the sides of the curve. This is 
because the method produces data points with a high degree of smoothness. It may significantly affect the accuracy of 
data generated using the Spline interpolation technique [32]. This technique is well suited for interpolating 
data that demands smoothness in the data curve. Finally, Figure 8(c) illustrates the Makima interpolation 
method. This technique is quite similar to PCHIP interpolation. While Makima is not as aggressive as 
PCHIP, it can deal with oscillations in the flat portion of the data. Makima tends to flow precisely through 
flat regions while simultaneously decreasing the instability caused by a sudden change in angle. It 
thereby mitigates the danger of peak overshoot. 
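
The no-overshoot behaviour attributed to PCHIP can be illustrated with a simplified, self-contained sketch. It assumes unit-spaced samples, sets each interior knot slope to the harmonic mean of the neighbouring secant slopes when they share a sign (zero otherwise), and uses one-sided secants at the ends. This is a simplification for illustration, not MATLAB's routine, which also handles unequal spacing and treats the end slopes differently. The step-like series is hypothetical.

```python
def pchip_slopes(y):
    """Shape-preserving knot slopes for unit-spaced data (simplified rule)."""
    delta = [y[k + 1] - y[k] for k in range(len(y) - 1)]
    # Assumption: one-sided secant slopes at the two end knots.
    d = [delta[0]] + [0.0] * (len(y) - 2) + [delta[-1]]
    for k in range(1, len(y) - 1):
        a, b = delta[k - 1], delta[k]
        if a * b > 0:
            d[k] = 2.0 * a * b / (a + b)  # harmonic mean of same-sign secants
        # Otherwise the knot is a local extremum or borders a flat run: slope 0.
    return d


def hermite(y0, y1, d0, d1, s):
    """Cubic Hermite value at offset s in [0, 1] on a unit-width interval."""
    c = 3.0 * (y1 - y0) - 2.0 * d0 - d1
    b = d0 + d1 - 2.0 * (y1 - y0)
    return y0 + s * (d0 + s * (c + s * b))


def pchip_dense(y, steps=20):
    """Sample the interpolant at `steps` points per interval plus the last knot."""
    d = pchip_slopes(y)
    out = []
    for k in range(len(y) - 1):
        out += [hermite(y[k], y[k + 1], d[k], d[k + 1], j / steps)
                for j in range(steps)]
    out.append(y[-1])
    return out


# Hypothetical step-like series with an abrupt change between two flat runs.
y = [1.0, 1.0, 1.0, 4.0, 4.0, 4.0]
curve = pchip_dense(y)
lowest, highest = min(curve), max(curve)
```

On this series every knot slope is zero, so the curve rises monotonically across the jump and never leaves the range spanned by the data, whereas a smooth cubic spline through the same points would typically swing below 1 and above 4 near the jump.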
Figure 8. Graph curve from the sample taken from (a) PCHIP interpolation, (b) Spline interpolation, and 
(c) Makima interpolation 


The inconsistency in the time intervals between data collection along the Klang River impairs the 
data’s validity and can obstruct other processes such as prediction and forecasting. Applying 
the interpolation method can improve the overall quality of the data. While interpolation techniques 
are widely used, it is necessary to analyze them using several approaches to ascertain their accuracy. 
This comparison covered only three methods available in MATLAB: PCHIP, Spline, and Makima. 
According to the results, PCHIP interpolation delivers the highest accuracy 
compared to Spline and Makima interpolation in reproducing the original form of the BOD data. 
This is a consequence of PCHIP’s aggressive attempt to preserve the actual shape of the 
original data. Makima is the second choice due to its less aggressive tendency to follow the original form. 
Still, it produces a curve almost identical to the original graph. Spline interpolation is more 
appropriate when the margin of error from the original graph points is flexible, because 
Spline tends to swing the curve when it encounters a rapidly changing turn. This study found 
the differences between the three methods to be fairly slight. While PCHIP interpolation provides the 
optimal answer, Makima and Spline interpolation are still viable options for filling in missing data, 
since each has characteristics that satisfy particular needs of the interpolation process. 
Ultimately, it is up to the user to determine the most appropriate technique for spatial interpolation based on their 
specific requirements. 


4. CONCLUSION 

Overall, this study has improved the accuracy of frequently used river 
water data for determining river water quality by filling in missing critical data for subsequent data 
processing. The comparison between PCHIP, Spline, and Makima interpolation offered by MATLAB 
has shown different characteristics among them. The results show that PCHIP interpolation gives the 
best accuracy in producing convincing interpolated data compared to the original BOD data. The 
PCHIP results show very small error gaps between the original and the interpolated data. This is 
very important, especially for keeping long-term data as close to the original as possible without losing too 
much information on the curves. 